Hey, everybody. Thanks for coming out. Okay. Well, I got a lot of slides. So I'm just trying
to just burn through them. We're just going to power through. So try to pay attention
to like the first five minutes of slides so that, you know, you'll be there with me when
we're hitting through this stuff. Okay. So I'm Brandon Wiley. I've done some stuff. I
wrote a thing called Freenet in like 2000, DEF CON 2000. Oh, thank you. Raise your hand
if you have ever run a Freenet node. Yeah, my people. Thank you. National heroes, every
one of you. So, yeah, my first talk ever, I was 18 years old. It was at DEF CON in 2000.
I presented about Freenet. The entire description of my talk was this is about Freenet. I drew
the slides with crayons. And that was it. It was like a packed room of people that came
to go see, like, a talk based on that information. And then at Black Hat, like 2003, I presented
Curious Yellow, which was my superworm design that was designed to destroy the Internet.
Purely theoretical, as you can tell because the Internet is still here. You can read more
about that: Charles Stross has a book called Glasshouse in which Curious Yellow is the
thing that like destroys humanity. So that was a great moment for me when he put that
in there. And then I used to work at BitTorrent.
So, like, I was there when BitTorrent bought uTorrent. So I apologize for that. But, yeah,
I did a lot of stuff at BitTorrent. And when I was at BitTorrent is when
I first saw deep packet inspection being used to block BitTorrent. In fact, when we
noticed that Comcast was blocking BitTorrent, before any of the press
heard about it, I was the guy that they sent to Comcast to try to reason with them. And,
well, you know how that worked out.
So I started doing ‑‑ I've been working on kind of anonymity stuff and mainly kind
of in the censorship resistance side of things for a long time.
So I know the folks from Tor from back in the day and I've been helping them out more
recently with their new like obfuscated protocols because Tor is being blocked in a lot of places
so they need a new protocol that's not blocked.
And then finally I have ‑‑ I wrote part of a book called Peer to Peer for O'Reilly
like a long time ago.
So anyway, so those are my credentials.
Who cares?
Whatever.
I'm just putting this up so that I can establish some credibility with you guys so that when
I start showing you pictures of cats, you don't just be like, what is this?
I'm out of here.
Because there's a lot of pictures of cats in my talk.
So yeah.
Cool.
Cool.
All right.
So let's get into it.
So my slides are taken from two different sources.
One is my children's book on Internet freedom called Free as in Kitties.
And the other slides are from my Ph.D. dissertation.
So I kind of meshed them together.
We'll see how it goes.
Right?
So we're going to start out with the Internet.
What is it?
Let's define some terms.
Hopefully you guys have checked it out.
If not, it's pretty cool.
Should get on there.
There's a lot of stuff on it, a lot of cats and stuff.
And then how do we Internet with this Internet once we know what an Internet is?
And then we just get straight up into binary classifiers using Bayesian statistical inference.
That's from the children's book.
No.
And then fooling binary classifiers with polymorphic protocols.
And then, you know, Dust, which is what this talk is about, which is the polymorphic protocol
engine.
And then I got some infographics.
And then if we get ‑‑ if we have time ‑‑ I forgot to start my timer.
There we go.
Then we'll talk a little bit ‑‑ I want to talk a little bit about realistic threat
models versus the threat models that everybody else uses.
Okay.
So, yeah.
So first of all, the Internet.
The Internet, as we all know, is the greatest technological marvel of our time and the pinnacle
of civilization.
It's an unprecedented way to deliver pictures of cats.
So I know what you're thinking.
You can't take a real cat and transmit it over the Internet.
Believe me, I've tried.
It doesn't work.
That's an analog cat.
So first step is we have to turn it into pixels with what they call pixelization.
So we get it into pixels and that's a digital form that we can transmit over the Internet.
So if we take this exact cat, we make it into pixels.
We have this.
It's a pixel cat.
Fun fact, if you go on Google image search and you're looking ‑‑ or just on Google
and you're looking for things like 8‑bit cat, pixel cat, low‑res cat, you'll find
a lot of OKCupid profiles of girls who live in Oakland.
It's a true story.
Okay.
Great.
So we got this cat.
Now we need to turn it into numbers because as we know, like, computers, they use numbers
and stuff.
So that's pretty easy.
We have all these various color spaces and things.
So we get like a number mapping for each color and then we run it through there and then
we get, you know, a map of numbers.
Okay.
So now we're good.
Now we have something computers can understand and we can transmit it.
So the first thing we've got to deal with is in the Internet.
When they designed the Internet, they didn't think it would be handling like, you know,
big chunks of data like cat pictures.
So it can only handle very tiny chunks of data.
So we split all of the data into all these kind of randomly sized different
things that we call packets.
And then we transmit them over an unreliable ‑‑ a possibly unreliable medium, right?
And then they all arrive, maybe.
Maybe they arrive.
Maybe they don't arrive at some point.
And then we try to kind of, like, cut and paste and stitch them back together to get the picture.
And then on the other end of the pipe, after all of this magic has happened, we get a pixel
perfect exact replica sent through the Internet of the cat that we started with.
There we go.
Yay, Internet.
Internet's great.
Okay.
So what's the problem?
I mean, the Internet's great.
We can look at cat pictures.
It brings us all a lot of love and joy.
Like who would ever want to try to stop this?
Well, robots.
Since the beginning of time, there's been a war between cats and robots.
No one knows why.
All we know is that robots have been programmed to hate cats, okay?
So here's how binary classifiers work, okay?
Robot looks at something.
It looks at the packets and it says, is that a cat?
Yes or no.
Those are all the options that we have.
That's why it's called a binary classifier.
That's the decision it's trying to make.
Cat, not a cat.
Okay?
Now, because they hate cats, if it is a cat, they replace it with a sad panda, okay?
All cats ‑‑ all cats are replaced by sad pandas.
Now if it's not a cat, don't care.
Don't care.
Just pass it through just exactly as it was, bananas, whatever.
It doesn't even ‑‑ they don't even know what bananas are.
They just know about cats and things that aren't cats because they're binary classifiers.
So don't care.
Pass it on.
Okay.
So.
The question is, how do we fool robots?
Okay.
So that we can transmit pictures of cats over the Internet without having them replaced
with sad pandas.
That's the question.
How do we fool robots?
Right?
Well, I think if you've been paying attention, remember I said pay attention like the first
five minutes, you already know the answer, right?
Right?
You got to make cats look like bananas.
And then robots don't care.
All right?
So here's the secret code to my talk.
Don't take a picture of this slide.
This slide is not on the Internet version of the talk.
This talk is just about cats and bananas.
So kittens are free speech.
Sad pandas are censorship of free speech.
Robots are filtering hardware that's made in America and then sold to companies all
over the world to make it so that people can't access the Internet and find out things about
like news about what's going on in their own country during elections and other critical
times like that.
Bananas are just messages that filtering hardware doesn't care about.
And then banana cats are free speech which is encoded so that it will get past the filtering
hardware.
Okay?
So, yeah.
So this is ‑‑ we're talking about some serious kind of deep stuff here, right?
This is like really important sort of stuff because the Internet needs to be free.
But, you know, I just kind of wanted to segue into this.
So now I hope that we're all at the same level, like we all are on the same page and understand
the code, right?
So now that you know the code, I can tell you about my project.
Dust makes cats into bananas in order to fool robots so that we don't have any more sad
pandas.
Okay?
All right?
So, yeah.
So that's the intro.
And now let's get, you know, into a little kind of some details here.
So how do robots see cats?
So robots can't see cats the way that you and I see cats where you look and you're like,
hey, it's a cat.
Right?
They only see the packets.
They see the grid of ‑‑ excuse me.
They see the grid of numbers.
Okay.
And then they have to use some kind of like statistical or like rule‑based ‑‑ because
they're robots, right?
They only know logic.
So here's one mechanism, right?
Which is you just look at the lengths of the packets, right?
It's all grouped into these kind of randomly sized packets.
You just kind of count, like, the first one has like 38 numbers in it, and you say, you know,
if things are in this kind of configuration, then it must be a cat.
Now this probably sounds really dumb.
You know?
You think that's not going to ‑‑ that's not going to work.
That has nothing to do with whether or not it's a cat.
So we're going to do a little ‑‑ we're going to do a little audience participation
test to see if you guys can classify traffic based on packet lengths.
Okay?
Are you ready?
Here we go.
This is a graph of HTTP packet lengths.
Now that thing on the far right side, that is not the border.
That's actually a giant spike in the graph.
There's a giant spike over there.
If you know about TCP, that's because of the Nagle algorithm which takes little packets
and then just helpfully for you it bundles them into big packets.
So since that's not turned off in HTTP, you have kind of this spike in the largest possible size packets.
Okay?
Now this is HTTPS.
HTTPS disables the Nagle algorithm in TCP by setting the no delay option.
And therefore it doesn't have that kind of ‑‑ it has this like totally different statistical
‑‑ like it still has, you know, a lot of like fairly big packets.
It doesn't have that spike on the end.
And it has kind of this other spike kind of around like 400 or so.
So I don't really know why.
I just look at the graphs.
Okay.
So I have just showed you two different graphs.
Now I'm going to ask you ‑‑ I'm going to show you a chart.
I'm going to ask you if you can guess which one it is.
Okay.
So raise your hand if you think this is a chart of HTTP.
Okay.
Raise your hand if you think this is a chart of HTTPS.
Okay.
Congratulations.
You are all robots.
It was neither.
It was Dust.
My project.
Okay.
It was something pretending to be HTTPS.
So yeah.
So it did a pretty good job, right?
I kind of tricked you, though, because I didn't have that option of like is this something
pretending to be HTTPS.
You might have picked that because that's kind of an obvious choice since that's what
we're talking about.
So yeah.
So packet lengths work as a way to determine if something is one protocol or another protocol.
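As a rough illustration of classifying by packet lengths, here is a minimal Python sketch. It is not how any particular DPI box works: the bucket width, the distance measure, the function names, and the sample lengths are all assumptions made up for the example.

```python
from collections import Counter


def length_histogram(lengths, bucket=100, max_len=1500):
    """Normalized histogram of packet lengths in fixed-width buckets."""
    n_buckets = max_len // bucket + 1
    counts = Counter(min(length, max_len) // bucket for length in lengths)
    total = len(lengths)
    return [counts.get(i, 0) / total for i in range(n_buckets)]


def distance(h1, h2):
    """Total variation distance between two histograms (0 = same, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def classify(lengths, profiles):
    """Return the name of the profile whose histogram is closest to the observed one."""
    observed = length_histogram(lengths)
    return min(profiles, key=lambda name: distance(observed, profiles[name]))


# Made-up training samples: a spike at full-size packets for "http",
# a spike around 400 bytes for "https".
http_sample = [1460] * 60 + [300, 520, 80, 1200] * 10
https_sample = [400] * 30 + [1300, 900, 1100, 700, 420] * 14
profiles = {
    "http": length_histogram(http_sample),
    "https": length_histogram(https_sample),
}
```

Calling `classify(observed_lengths, profiles)` returns whichever label has the nearest histogram, which is all a yes-or-no robot needs.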
And the reason that we care about this is because these days the way they block the
Internet is they don't say, hey, you're looking at this thing that we don't want.
We don't want you to look at.
So we're going to block it.
They say, hey, you're using BitTorrent, blocked.
Hey, you're using Tor, blocked.
You're using SSL, blocked.
You're using a VPN, blocked.
They just block it by the protocol regardless of what you're doing.
And that's crazy because you could be doing all kinds of things.
But you know, if they can't look at what you're doing to determine whether or not they
like it, they're just going to go ahead and block it by default.
And so they do it based on protocol.
So, for instance, there are situations in which SSL is going to
be blocked and you can only use unencrypted HTTP.
Well, that's okay if you can make your traffic look like unencrypted HTTP even if it's not.
So yeah.
So Dust removes packet length information.
But it doesn't just randomize it.
It randomizes it according to a target distribution of whatever you want.
So you pick a protocol and Dust will make your packet lengths look like that protocol.
Any protocol, doesn't matter.
Just give me some sample traffic.
I'll sample it and I'll make a profile and I'll make it look like that.
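A minimal Python sketch of that idea, under stated assumptions: `build_length_profile` and `shape` are hypothetical names, not Dust's actual code, and the sketch leaves out encryption and the framing a receiver would need to strip the padding.

```python
import random
from collections import Counter


def build_length_profile(sample_lengths):
    """Turn observed packet lengths into (lengths, weights) for weighted sampling."""
    counts = Counter(sample_lengths)
    lengths = sorted(counts)
    weights = [counts[length] for length in lengths]
    return lengths, weights


def shape(payload, profile, rng):
    """Split payload into packets whose sizes are drawn from the target profile.

    The last packet is padded with random bytes up to its drawn size.
    """
    lengths, weights = profile
    packets = []
    offset = 0
    while offset < len(payload):
        size = rng.choices(lengths, weights=weights, k=1)[0]
        chunk = payload[offset:offset + size]
        offset += len(chunk)
        padding = bytes(rng.getrandbits(8) for _ in range(size - len(chunk)))
        packets.append(chunk + padding)
    return packets


# Hypothetical sample of lengths taken from the protocol we want to look like.
profile = build_length_profile([100, 100, 400, 1460])
packets = shape(b"x" * 5000, profile, random.Random(1))
```

Every packet that comes out has a length that appeared in the sample traffic, in roughly the same proportions.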
So here's one of the tools that I've made for looking at deep packet inspection hardware
and trying to figure out how it's doing classification so that we can, you know, circumvent that classification.
I made this tool called Shaper.
You give it a model of a protocol, a statistical model.
So for instance, like, a model of packet lengths.
It then does the trick from before and makes traffic that looks like that.
Just infinite traffic that looks like whatever you want it to look like.
And then we pass it through and we say, hey, is this such and such or not?
And then we get the answers back.
And then we can tell how good the different hardware is at classifying protocols.
And then once we can do that, we can get better at making encodings that hide stuff from the
classifiers.
And so that's one of my open‑source tools.
You can use it.
If you have some hardware, you can like throw traffic at it and test it and see how it's
doing classification.
Okay.
So second type is ‑‑ it just looks and says, hey, there's some statistical properties
of this traffic.
Like, for instance, I see a whole bunch of sixes.
I'm going to count the number of sixes.
If there's like a bunch of sixes, then that means that it must be, you know, whatever.
It must be some particular type of traffic.
So here's some examples of that.
So this is an English dictionary.
And I looked at the probability of different bytes to occur in that dictionary, right?
So the one on the far left is just new line because it was just a list of words.
So don't pay attention to that.
That's just ‑‑ I didn't clean the data because real data is dirty.
So I'm showing you the dirty data.
And so there's ‑‑ yeah.
So this is the main thing.
This is lowercase letters of the alphabet, right?
So you can see there's definitely a spike.
To the left is a little spike that's uppercase letters.
There's a lot of uppercase letters in the dictionary, more than you would think, but
a lot less than lowercase letters.
So yeah.
So that's ‑‑ clearly there's, like, statistical sort of stuff.
If you look at, like, a U.K. English dictionary, it's a slightly different sort of thing.
This is HTTP.
Oh, my gosh.
It's the same spike.
Why is that?
It's because HTTP traffic actually has a lot of, you know, like, ASCII letters in it as well.
Like HTML elements are often lowercase letters. There's a little bit of a bigger spike in the uppercase
letters.
But yeah.
So you can see this bleeds through.
Like, we know that this was English HTTP traffic, or at least, like, HTML HTTP traffic, right?
We know this was not images because we can just look at this distribution, right?
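The byte-frequency graphs described here can be sketched in a few lines of Python. The function names and the sample HTML are assumptions for illustration only.

```python
from collections import Counter


def byte_distribution(data):
    """Probability of each of the 256 byte values occurring in data."""
    counts = Counter(data)
    return [counts.get(value, 0) / len(data) for value in range(256)]


def lowercase_ascii_share(data):
    """Fraction of bytes that are lowercase ASCII letters (the big spike)."""
    dist = byte_distribution(data)
    return sum(dist[ord("a"):ord("z") + 1])


html = b"<html><body><p>hello world</p></body></html>"
uniform = bytes(range(256))  # stand-in for random-looking data
```

Comparing `lowercase_ascii_share(html)` with `lowercase_ascii_share(uniform)` shows how text-heavy traffic stands out from data where every byte value is equally likely.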
So I feel like a lot of people think that, you know, if you kind of wrap your traffic
in something, it hides it.
But a lot of stuff happens.
It actually bleeds through.
Here's HTTPS.
Oh, my gosh.
It has the same spike.
Why does HTTPS, which is encrypted, have the same spike in English letters?
It's because SSL is encrypted, but the header is not encrypted, right?
And the header has a bunch of information in there that uses normal, like, English letters,
like the name of the Web site and stuff like that, the SSL common name, as they call it.
And that's how they get you with the SSL.
That's how they get you with the encrypted traffic: they look at the unencrypted headers.
It's actually super easy to tell what protocol you're using, even if you're using an encrypted
protocol, if there's an unencrypted header.
So I think people have this idea, let's just encrypt everything with SSL.
Well, that doesn't work because you can tell it's SSL and people just block SSL.
So yeah.
So Dust fixes that, too, right?
Dust removes the statistical content information.
I use this thing called reverse Huffman encoding where I encrypt everything to make it random
and then I reverse Huffman encode it to make it not random, to make it just whatever.
Right?
So if you say the only characters ‑‑ the only bytes you can use are F and A, I will
give you a stream of just Fs and As that encodes your traffic.
Whatever you want, whatever distribution you want, I'll make it look like that.
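One way to read "reverse Huffman encoding" is: build a Huffman code for the target byte distribution, then run random-looking bits through the Huffman decoder so that the output symbols follow that distribution. The Python sketch below follows that reading. It is an assumption about the technique, not Dust's actual implementation, and all function names are hypothetical. Note that Huffman codes only approximate a distribution, and with just two allowed bytes each one gets a one-bit code.

```python
import heapq
from itertools import count


def build_codes(weights):
    """Build a Huffman code {symbol: bitstring} from {symbol: weight} (2+ symbols)."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, zero = heapq.heappop(heap)
        w1, _, one = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in zero.items()}
        merged.update({s: "1" + c for s, c in one.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


def reverse_huffman_encode(random_bytes, codes):
    """Read the input bits as Huffman codewords and emit the matching symbols."""
    table = {code: sym for sym, code in codes.items()}
    longest = max(len(code) for code in codes.values())
    # Zero padding lets the final, partial codeword run to completion.
    bits = "".join(f"{b:08b}" for b in random_bytes) + "0" * longest
    needed = len(random_bytes) * 8
    out, current, consumed = [], "", 0
    for bit in bits:
        current += bit
        if current in table:
            out.append(table[current])
            consumed += len(current)
            current = ""
            if consumed >= needed:
                break
    return bytes(out)


def reverse_huffman_decode(shaped, codes, n_bytes):
    """Huffman-encode the symbols back into bits and keep the first n_bytes."""
    bits = "".join(codes[sym] for sym in shaped)[:n_bytes * 8]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Hypothetical target: mostly "a", some "b", a little "c" and "d".
codes = build_codes({ord("a"): 4, ord("b"): 2, ord("c"): 1, ord("d"): 1})
secret = bytes(range(256))  # stands in for already-encrypted, random-looking data
shaped = reverse_huffman_encode(secret, codes)
```

In this sketch the receiver needs the original length (`n_bytes`) to discard the padding bits; a real protocol would have to carry that some other way.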
And then finally, and this is ‑‑ I know you guys are going to be like, that's stupid.
No one does that.
But yeah, this is the most popular way of classifying traffic.
You look for a sequence of bytes at a particular offset in the file.
And then that's it.
You see this, like, for instance, HTTP traffic, you know, it starts with, like, HTTP get,
HTTP post.
They just look at the first four bytes.
If it's HTTP, they classify it as HTTP traffic.
That's it.
And that is, like, 90% of all DPI classification that's, like, actually deployed and used for
censorship is just doing that.
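A fixed-offset byte match like this takes only a few lines. In the Python sketch below the signature table is illustrative: real HTTP requests do begin with a method such as `GET ` or `POST`, but the signatures actual vendors ship are not public, so treat these entries as assumptions.

```python
# Illustrative signatures only: label -> byte strings expected at offset 0.
SIGNATURES = {
    "http": [b"GET ", b"POST", b"HTTP"],
}


def classify_first_packet(payload):
    """Label a flow by looking only at the first four bytes of its first packet."""
    head = payload[:4]
    for label, prefixes in SIGNATURES.items():
        if head in prefixes:
            return label
    return "unknown"
```

Anything that does not start with one of those four-byte strings falls into "unknown", whatever the rest of the flow looks like.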
So yeah.
So we remove that.
Right?
Because, you know, that's not going to work.
All right.
So along those lines, I have this other tool that I made that's part of the Dust kind of
suite of tools, which is for figuring out what these byte sequences are, because these
signatures ‑‑ I call them signatures ‑‑ are not public.
Like, they don't want to tell you what bytes they're looking for because it would make
it easy to obfuscate your traffic, right?
So if you have some DPI hardware, I have this tool that will take some sample traffic and
then replay it with all these different variations where it blanks out certain bytes.
And then you can look at the results and you can find the exact string that they're
looking for.
And you can do the ‑‑ you know, again, you can do that for any protocol.
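The blank-and-replay idea can be sketched as follows in Python. Here `is_flagged` stands in for replaying traffic through real DPI hardware and reading back its verdict; that callable, the function name, and the sample payload are all hypothetical.

```python
def find_signature_offsets(sample, is_flagged, filler=0x00):
    """Return the offsets whose bytes must be intact for the sample to be flagged.

    Each byte is blanked in turn and the mutated sample is replayed. Offsets
    where the verdict flips are part of the signature. A byte that already
    equals the filler value cannot be detected this way.
    """
    if not is_flagged(sample):
        raise ValueError("sample traffic is not flagged to begin with")
    needed = []
    for i in range(len(sample)):
        mutated = sample[:i] + bytes([filler]) + sample[i + 1:]
        if not is_flagged(mutated):
            needed.append(i)
    return needed


# Stand-in for the hardware: flags anything with b"BITT" at offset 4.
def fake_filter(payload):
    return payload[4:8] == b"BITT"


offsets = find_signature_offsets(b"xxxxBITTyyyy", fake_filter)
```

The offsets that come back mark the exact bytes the filter is keying on.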
Okay.
So to break it down for you, what Dust does is if you define a set of properties that
deep packet inspection hardware is looking at to filter, and you define, you know, like,
which things go in which category based on those rules, then for whatever property that
is, Dust will randomize that property to remove all information.
And it randomizes it according to a probability distribution to force the classification to
whatever category.
So you tell me what categories your hardware has and I can make arbitrary traffic get put
in any of those categories.
The reason you want to do this is because you want to get into the category that's not
being blocked, whatever that is, right?
Like there was a recent instance where an adversary was blocking everything except for HTTP, and
HTTP connections could only be 60 seconds long and then they were automatically closed.
And so a lot of protocols had trouble with that.
Dust says, fine, 60 second HTTP connections, let's do it.
And then encodes all the traffic that you have over, you know, that protocol.
So yeah.
So basically if you let any messages through, then you have to let all messages through
because we'll just encode into the set of messages that are allowed.
And then the ultimate point of all of this is I have this message server that you give
it arbitrary messages.
It encodes them to look like bananas.
They're passed through.
And then people are reunited with the cats that they love.
And that's really what it's all about is just letting people get to the content they
want to get to, post what they want to post, read what they want to read, and just have
free speech on the Internet.
Cool.
So that's the end of my linear part of my talk.
And now I have several bonus slides depending on how much time we have.
And I think ‑‑ yeah.
I think I've run through those pretty quick.
So I'm going to go ‑‑ yeah, I'll just go through them.
And then when we do Q&A, maybe some of the questions will also be related to these slides.
Okay.
So sometimes people ask me about various other projects and how, like, Dust is different
from these other projects.
And I don't really think of them as competitors.
Like, I mean, people are going to choose to use one kind of
encoding or another for their traffic to get it past this filtering hardware.
But just use whatever works.
I mean, all you want to do is get past the filtering hardware, right?
So if something works, do it.
And if it stops working, then, you know, switch to something else.
So I worked with Tor on obfsproxy, which is their obfuscating protocol.
And so that's an example of a protocol where it just obfuscates, right?
Like it just makes everything look totally random.
And that's good.
That's pretty good.
That will get you past a lot of things.
But some of the hardware now will actually ‑‑ I don't know.
It will actually flag stuff as random looking, at which point you can make a custom rule
that says, hey, if it's random looking, block it.
If you can't classify it, that's okay.
Just block everything that has, like, high entropy.
Like if you guys have heard about, like, the entropy attacks, those are really awesome
attacks that work really well.
They're not really widely deployed.
But you can custom configure them in some of the hardware.
So that's the issue with just obfuscating stuff.
You need this second layer where you shape it to look like the stuff which is whitelisted.
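A block-anything-random rule comes down to an entropy estimate. This Python sketch computes Shannon entropy per byte; the 7.5-bit threshold and the function names are assumptions for illustration, not values taken from any real hardware.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(data).values()
    )


def looks_random(data, threshold=7.5):
    """Flag payloads whose byte entropy is close to the 8-bit maximum."""
    return shannon_entropy(data) > threshold
```

Fully obfuscated traffic sits near 8 bits per byte, while text-like protocols sit far lower, which is why shaping toward a whitelisted distribution matters.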
A lot of people are doing a lot of research on mimicking specific protocols, especially
HTTP.
People are just trying to make stuff that hides ‑‑ like, steganographically hides
information in HTTP.
So the problem with that approach is that people always choose the most common protocols,
the ones that they think, like, no one will ever block this protocol because it's too
important.
People usually say that about SSL, and now it's totally been blocked.
So people are really focusing on HTTP.
The problem with that is that the DPI hardware has the most visibility into HTTP of any protocol.
There are actually whole boxes that just do HTTP interception and do, like, semantic
parsing of all of the headers and all of that kind of stuff.
So you have to do a lot of work to look like HTTP.
In fact, there was this paper recently called The Parrot is Dead in which they talk about
that.
They're pretty sure that given any kind of traffic that mimics some other kind of traffic,
they can make a test exist where they can differentiate the two because there's going
to be a difference between, like, your HTTP implementation and, like, a real HTTP implementation.
So people are trying to do this crazy stuff where they're, like, trying to get, like,
an actual browser.
Like, they're trying to get Firefox and try to make Firefox, like, load pages, and then
they encode, like, information in the way, like, which pages you choose and the timing
and stuff.
And that's fine.
It's just, like, a very simple thing.
It's a very slow protocol.
And you don't need to do any of that because, like I said before, the DPI hardware is just
most of the time saying are the first four bytes HTTP?
And then that's all you need to do.
A lot of the hardware only looks at the first packet because they're trying to scale and
so they're basically they're cheating in their design, right?
Like instead of, like, looking at all the packets because they want to be able to push
more throughput and be able to tell the people that are buying it, like, oh, yeah, we can
handle your whole country's traffic and, you know, you don't need that many boxes, you'll be
fine.
They just look at the first packet and they classify it and they just, like, forget it.
It's been classified so they just stick with that classification.
I was talking to a DPI vendor who said that for some protocols they have to
look at, like, 20 packets, oh, no, 20 packets, before they can classify it.
So it's a lot easier than trying to actually, like, be exactly like this protocol.
And then there's a really cool project called format transforming encryption that you give
it a grammar for a protocol.
Like, for instance, you say, like, HTTP.
Yeah.
Or, like, FTP or, like, SMTP.
And then it will generate random messages that conform to that grammar.
That's a pretty cool project.
So I checked that one out.
So the difference in what I'm doing is that I'm not writing a protocol.
Like obfs3 is, like, Tor's current protocol for obfuscation.
You look at FTE.
That's kind of a protocol engine.
But most people are just thinking let's make one protocol that can never be blocked.
And I got to tell you that doesn't exist.
There's no protocol that cannot ever be blocked by anybody.
It just depends on your settings.
Like your attacker, your adversary is going to have some configuration on their hardware
for block this, don't block this.
And it's going to be different for everybody.
There is no one protocol.
So instead I wrote a protocol engine where you just ‑‑ instead of updating it with
each revision when it gets blocked, you just change the settings.
Like you say, okay, before we were making traffic look like HTTP.
Now let's make it look like ‑‑ let's do some UDP‑based thing, you know?
Let's just get crazy.
Let's use UDP.
Let's make it look like Skype, whatever.
And then, you know, if they block that, then again, just, you know, just switch it up.
Switch it up every day.
In fact, don't even just mimic protocols.
I have this thing that I can't really convince anyone is a good idea that I think is awesome
which I call chimeric protocols where you take, like, two protocols.
You take, like, I don't know, like SMTP and, like, NTP.
And then you just kind of, like, smoosh them together and you get this protocol that people,
like, I don't know what that is.
Right?
And just keep them busy.
You know, they got guys, right?
They got to configure this hardware.
They first have to notice your anomalous traffic, right?
Then they have to figure out what you're doing.
Then they have to make a configuration and then they have to make sure that it, like,
evenly splits out your traffic from, like, the legit traffic.
So you know, just, like, just keep it rolling.
In fact, with just a probability distribution, you could even make up just random
distributions.
You know, you could be, like, in this protocol everything is always going to be five bytes
long or, you know, like, 1,400 bytes long.
I don't think there's any protocols like that, you know?
So yeah.
Another thing is my thing is purely statistical because that's how ‑‑ they actually look
per packet is how the classifiers work.
So my stuff is per packet.
In The Parrot is Dead paper they actually reference my work and they say, I think, we've
determined in this paper that packet-based stuff like Dust is just never going to work.
And it's, like, right.
It's not going to work against a bunch of CS professors and all of their grad students
in a lab looking at, like, just, like, two different, like, PCAP files, sure.
But against the actual deployed hardware it works awesome.
I know because I have the hardware and I pass it through there and it works awesome.
So I think that's kind of, you know, that's kind of one of the differences there.
And thank you.
Thank you.
All right.
And so another difference is, like, with FTE, format transforming encryption, it's a great
project.
You need a protocol specification so that you can follow that grammar.
With dust you just give me some sample traffic and I'll just build a model from that.
In fact, the best thing is you give me some sample traffic of traffic that was blocked
and some sample traffic of traffic that wasn't blocked and I can from that make you a protocol
that will be guaranteed to not be blocked.
Well, not guaranteed.
But it won't be blocked.
Without having to even know ‑‑ I don't even need to know what protocol it is.
I just need you to give me the PCAP files and I just process them and then we're done.
Another thing is, so a lot of people that are doing these specific protocols like HTTP
modeling, they model the protocol and they say, what does the protocol look like?
Let's look exactly like this.
What I do is I model the filtering hardware and I say, what does the filtering hardware
think that HTTP looks like?
Let's look like that.
Right?
And then not do any more work than necessary so we get maximum efficiency while still definitely
getting past that hardware.
You give me some different hardware, I might come up with a different protocol.
I think this all comes down to I'm aiming for a realistic threat model.
I want to base my threat model on what's deployed and what's being used to censor countries.
And then one more thing I just added right before the talk is that there's no shared
secrets.
Everything is totally public.
The source code is out there.
You can get it and even the protocol doesn't have any kind of shared secrets or anything.
So you can know that people are running dust.
It doesn't help you figure out who is running dust because the traffic by definition looks
like the traffic that you don't care about, right?
So even if you download it, you run your own experiments, unless you know what settings
people are using, it won't help.
But even if you know what settings ‑‑ like the battle is ‑‑ you have to make a
better rule for your filter that can tell between the mimic traffic and the real traffic.
So it's no longer like a war of technology.
It's like a war of who has the better information, like who has the better ‑‑ the better
models.
So talking about threat models, so in the academic world, the threat model hierarchy
of threats is if someone just published a paper and it won best paper award, that's
the adversary that you need to attack was the adversary in that paper, right?
And then otherwise ‑‑ like if there's ‑‑ like a real threat, you're going to have to attack.
If it's a recently published attack, you should defend against that.
Otherwise ‑‑ if there was an attack published before 2003, no one cares.
No one is working on that in academic research at all.
And cool.
So that's kind of my issue with academic stuff is they're really good at classifying
traffic in the lab.
But I mean, who cares?
Because until it makes it to hardware, until it's deployed, until it's being used for censorship,
it doesn't really matter.
I have a slide about open source threat.
And I just want to say ‑‑ I mean, this is my experience working
on Freenet, working on an open source project, is that the biggest threats, the number one
threat is whatever you come up with that you can think of, that's like, oh, that's
what I defend against.
Because like I thought of it and so it's like probably pretty serious attack.
And then secondly is like if someone on the mailing list comes up with it, then, you know,
it's pretty bad.
Or if it's on Reddit.
Like if somebody attacks your system on Reddit, like in a Reddit thread, and they're like,
your system sucks.
It's totally broken.
I know because I broke it.
It's a bad attack.
Then that's what people defend against.
And then finally everybody always adds plausible deniability as a thing.
I know we did it in Freenet.
You know?
So it's like I've been there.
Everybody just always thinks you've got to add plausible deniability.
And I think that this is a bad road to go down as well.
So my threat model is based on: is this attack actually being done in the wild to censor
traffic a lot?
And so that would be, for example, the static byte sequence matching.
That's like number one thing.
So like if you don't defend against that, then we don't even need to talk about it.
And there's actually still obfuscating protocols that begin with a magic number in the handshake.
And so if you just put that magic number into the filter, then that protocol is gone.
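The kind of filter described here can be sketched in a few lines. This is only an illustration: the magic number, the payloads, and the function name `is_blocked` are made up and do not come from any real protocol or DPI product.

```python
# Hypothetical signature list: the magic number below is invented for
# illustration and does not belong to any real protocol.
SIGNATURES = [bytes.fromhex("deadbeef")]


def is_blocked(first_packet: bytes, signatures=SIGNATURES) -> bool:
    """Return True if the packet contains any known static byte sequence."""
    return any(sig in first_packet for sig in signatures)


# A handshake that always starts with the same magic number is caught at once.
magic_handshake = bytes.fromhex("deadbeef") + b"\x00\x01hello"
# A handshake with no fixed bytes gives this kind of filter nothing to match.
opaque_handshake = bytes.fromhex("7a13c0995be2048f61d7")

print(is_blocked(magic_handshake))   # True
print(is_blocked(opaque_handshake))  # False
```

Adding one more entry to `SIGNATURES` is all it takes to block another protocol with a fixed handshake prefix, which is why this is the first attack to defend against.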
And then, you know, if you see it occasionally, that's good, too.
We'll do that.
And then finally if the capability is in hardware but just hasn't been used, then that's like
lowest priority.
But I'll still do that.
And there's some like really awesome hardware.
I met a lot of people actually this weekend that were telling me about some DPI hardware
that sounded like totally sweet.
No one is using it.
But if anybody ever buys it ‑‑ so one of the things about DPI hardware, it's like
old.
It's really old.
No one ever upgrades.
So a lot of these countries that are filtering, they're using like 10‑year‑old hardware.
So that's the first thing: the 10‑year‑old hardware is the first thing that we need to
defend against.
And you would be surprised at the protocols that are coming out that fall
instantly when thrown against 10‑year‑old hardware.
Because the designers are reading the papers or they're going on the mailing list rather than looking
at the actual hardware.
Let me flip through, see if I have some more slides here.
Let's see.
Yeah.
Okay.
That's a good question.
So, yeah.
So you have to have a client and you have to have a server.
And they both need to be speaking the protocol.
You need the public key of the server.
You need that because I need to have ‑‑ to be able to do a handshake where we don't
have to communicate anything that's not purely random bytes.
So let me go ‑‑ I have ‑‑ let's see ‑‑ yeah, I won't really get into
the key exchange.
I don't have a lot of time.
But the key exchange and everything is all purely random.
So you need to have the public key ahead of time.
So when you find out the address of the server, you need to find out its IP, its port, its
public key.
And then also the configuration for what specific protocol you're going to be speaking.
So it all needs to be out of band in the invitation.
And so I know that's not ‑‑ that's kind of not the way that people usually do it.
People like to do these like you connect and then you just handshake everything like
right there.
That's kind of like a more popular way to do it.
And I just feel like that way doesn't work.
You need to have a little bit of information transmitted out of band beforehand in order
to have all of the properties that we want to have.
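As a rough sketch of what such an out-of-band invitation could carry, assuming a JSON encoding: the class name, field names, and encoding below are hypothetical and are not Dust's actual invite format. The talk only states which pieces of information are needed (IP, port, public key, protocol configuration).

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class Invitation:
    # Field names are hypothetical; the talk only lists the information needed.
    ip: str
    port: int
    public_key: bytes  # server's public key, learned ahead of time
    protocol: dict     # configuration for the specific protocol to speak

    def encode(self) -> str:
        """Serialize to text that can be passed along out of band."""
        d = asdict(self)
        d["public_key"] = base64.b64encode(self.public_key).decode("ascii")
        return json.dumps(d, sort_keys=True)

    @classmethod
    def decode(cls, text: str) -> "Invitation":
        """Rebuild the invitation from its text form."""
        d = json.loads(text)
        d["public_key"] = base64.b64decode(d["public_key"])
        return cls(**d)


# 192.0.2.10 is a documentation address; the key here is a placeholder.
invite = Invitation("192.0.2.10", 443, bytes(32), {"model": "example-profile"})
assert Invitation.decode(invite.encode()) == invite
```

Because the client already holds everything in the invitation, nothing recognizable has to be negotiated on the wire when the connection opens.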
Let's see.
Okay.
Okay.
Let's do questions.
And if slides ‑‑ there are slides that are referenced by questions, that's fine.
Anybody got any questions?
Oh, we got a mic.
That's good.
This is a big room.
Come to me.
I don't have a long cord.
And shockingly no wireless here.
What?
At DEF CON?
So how do we ‑‑ what about the
How do we run a Dust server to help out? Is there a community set up or such or EC2 instances
or anything like that? How can we make those endpoints that people can connect to?
Right. So that's a good point. So Dust right now is not actually a service. It's a protocol
and it's like an implementation of that protocol which is designed for other people to use.
So like for instance with Tor, I worked with them on obfsproxy, which is part of their
pluggable transport system where you can basically make ‑‑ anybody can make a
new transport for Tor and so that's kind of one of the targets is like a Tor wrapper that
uses this. And then also I'm trying to make it into like a library where you can use it
like just in your own kind of protocol. There's currently no system for just doing like
open proxies that are based on Dust. I think that that's not really the model that I want
to go with just because I know from the
Tor guys from way back when like how much work it is to run a community of volunteer
nodes. Well, Freenet we had that issue as well. Freenet was actually pretty low maintenance.
People just run it. There wasn't a lot of coordination. But yeah, so right now this
is ‑‑ let me go to the slide on whether or not you should put real traffic on it which
is no, don't put real traffic on it because this is a purely, purely experimental sort
of thing. Yeah. So yeah.
There's no ‑‑ I don't have a good answer for that yet. But that's a good question.
I'm going to work on that. Okay. I guess this is more of a general question
for all obfuscating protocols. But couldn't the attacker just notice that you're only
communicating with one machine all the time and it's always HTTP and you never get anything
blocked and then just block all access that way to that machine?
Right, right. I see what you're saying. So you're talking about like your connection
patterns being anomalous, right? Like you're making long‑lived connections to a
single machine. So that's one of the things I'm doing in the next version that I'm working
on: being able to split the traffic of one conversation over multiple connections to multiple
machines. I've already got it where like some protocols actually use multiple
different ports. Like if you look at OpenVPN, it uses 443 and like 1194. I already
have that as part of the statistical model where you can say, yeah, use like 80% on 443
and 20% use like 1194, right? So you can take that to hosts, too. You can
be like split your traffic among this set of hosts with this probability distribution,
use these ports with this probability distribution. So, yeah, I'm totally working on that. Also,
I'm working on a thing where you can split your traffic over simultaneous TCP and UDP
conversations using different profiles, different protocols with different hosts and it all
just gets kind of funneled back together into one stream on the other end. That's a lot
of work, though. So it hasn't come together yet. It's just a lot of bookkeeping and stuff.
Next step, though. Yeah, that's the next step.
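The port and host splitting described in this answer amounts to drawing each new connection's endpoint from a probability distribution. A minimal sketch follows. The 80/20 port split is the one mentioned above; the host addresses, their weights, and the function names are assumptions for illustration, and hosts and ports are drawn independently for simplicity.

```python
import random

# The 80/20 port split is from the talk; hosts and their weights are made up.
PORT_WEIGHTS = {443: 0.8, 1194: 0.2}
HOST_WEIGHTS = {"192.0.2.10": 0.5, "192.0.2.11": 0.3, "192.0.2.12": 0.2}


def pick(weights: dict, rng: random.Random):
    """Draw one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def next_endpoint(rng: random.Random):
    """Choose the (host, port) pair for the next connection."""
    return pick(HOST_WEIGHTS, rng), pick(PORT_WEIGHTS, rng)


rng = random.Random(1)
sample = [next_endpoint(rng) for _ in range(10_000)]
share_443 = sum(1 for _, port in sample if port == 443) / len(sample)
print(f"share of connections on 443: {share_443:.2f}")
```

Reassembling the split traffic into one stream on the other end is the bookkeeping-heavy part and is not shown here.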
So it seems like the obvious escalation for the hardware manufacturers is to just
move up the chain and start classifying distributions of bi-grams, tri-grams, like hashes of tokens
in HTTP. Have you seen any evidence that they're moving that way or are you sort of
banking on the fact that that's like a lab CS world theoretical attack and not likely
to be deployed in practice?
Well, so to come back to the basic principle
of Dust, if you define a property, I will randomize over that property.
So if you move from a first order probability model
for content where you're just looking at individual bytes to looking at bi-grams or
tri-grams and that's deployed and I see that, I will simply randomize on the bi-gram and
tri-gram level. And I can do that a lot faster than the hardware people that need to do all
that stuff, test all the stuff and get people to buy it and then get people to roll it out.
I could do that today. The only reason I haven't done it is because
it's not deployed and also, like, today
specifically I'm really busy doing some of the DEF CON contests.
So I, you know, yeah. We're not done yet. Stop clapping.
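To make the step from single-byte statistics to bi-grams concrete, here is a small sketch that learns which byte follows which in a sample and then emits bytes with the same pairwise statistics. It is not Dust's encoder: it only generates cover bytes and does not show how real payload would be encoded reversibly. The function names and the sample traffic are assumptions.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(sample: bytes):
    """Count, for each byte value, which byte values follow it."""
    following = defaultdict(Counter)
    for a, b in zip(sample, sample[1:]):
        following[a][b] += 1
    return following


def generate(following, length: int, rng: random.Random) -> bytes:
    """Emit bytes whose adjacent pairs follow the trained bigram counts.

    If a byte has no recorded successor, generation restarts from a
    randomly chosen byte that does have one.
    """
    starts = list(following)
    current = rng.choice(starts)
    out = [current]
    while len(out) < length:
        nxt = following.get(current)
        if not nxt:  # dead end: restart from a byte with known successors
            current = rng.choice(starts)
        else:
            values = list(nxt)
            current = rng.choices(values, weights=[nxt[v] for v in values])[0]
        out.append(current)
    return bytes(out)


sample = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 4
model = train_bigrams(sample)
cover = generate(model, 64, random.Random(7))
seen = set(zip(sample, sample[1:]))
print(all(pair in seen for pair in zip(cover, cover[1:])))
```

Going to tri-grams would mean keying the table on the previous two bytes instead of one, which is a software change rather than a hardware rollout.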
So how do you specify what's allowed through? Do you have the client email out of band
some PCAP data for things that they were able to do and what they weren't able to do?
What are the actual details of how that gets specified?
So there's kind of two parts there.
There's how do I make a model of a protocol and then how do we communicate that model
to the client so they can connect to the server.
So in terms of modeling the protocol, I have some tools that take PCAP files and then actually
boil them down into, like, a statistical ‑‑ like, it takes out all of the individual packets
and just gives you the statistical model and it makes that into, like, a tiny little file
that you can e‑mail to somebody.
And you bundle that up into what I call, like, an invite packet, which has the IP and the
port and the protocol configuration information all in one thing.
So all you need to do is tell Dust, here's my invitation, and then it will connect to
the server and do everything right.
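A rough idea of what boiling a capture down to a small statistical model could look like, assuming packet length is the property being modeled. The talk does not say which statistics the real tools record, and reading the PCAP file itself is left out here; the lengths below are made-up values.

```python
import json
from collections import Counter


def length_model(packet_lengths, bucket: int = 100) -> dict:
    """Reduce per-packet lengths to a normalized histogram of length buckets."""
    counts = Counter((n // bucket) * bucket for n in packet_lengths)
    total = sum(counts.values())
    return {str(b): counts[b] / total for b in sorted(counts)}


# Lengths as they might be read out of a capture file (made-up values).
lengths = [60, 60, 1500, 1500, 1500, 1500, 420, 60, 1500, 90]
model = length_model(lengths)
blob = json.dumps(model)
print(blob)
```

The individual packets are gone from `blob`; only the distribution remains, which is why the result stays small enough to pass along with an invitation.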
And so in terms of how you make those, what I do is I have deep packet inspection hardware
and I look at what gets through and what doesn't get through.
Now obviously it depends on how you configure it, like, what kind of traffic you, like,
are against.
So what I do is I look at real world instances of filtering, I find out what they're using,
I get that hardware, I configure it to, like, reproduce the reported behavior, and then
that's how I try to make a realistic model.
Which brings me to something I wanted to say about contribution.
Here's a bunch of ways you can contribute.
Everything is written in Haskell and my Haskell to C is really weak, so if anybody knows
Haskell to C, I could really use some help making my Haskell to C bindings not suck.
And then also if anybody has any DPI hardware, that would be cool.
Because I have some, but I don't have it all.
In particular, I need some Huawei.
So if anybody's got any Huawei gear that they want to let me, like, send some packets through,
you can help save the Internet from being censored.
So you know.
It's, like, on the DL.
You're saying Huawei is, like, maybe a security problem there?
Was that?
You're saying Huawei is a potential security risk?
For my project?
No, in general.
In general?
I wouldn't say in general.
I mean, they have good stuff.
They have good stuff.
They're really good at filtering stuff.
So I don't know if my stuff works against them.
I don't know if it works against Huawei or not because I don't have a Huawei box.
Yeah.
Anyway.
More questions.
Do you think it's possible to put a Dust client in the filter so the message
can be decrypted?
I mean, automatically.
Yeah, we can use a key, I mean, exchange there.
But the protocol, I mean, is relatively more constant than that.
So if we just reverse engineer that.
The protocol, I mean ‑‑ I mean, reverse engineer my protocol?
Uh‑huh.
Oh, you don't need to reverse engineer it.
You can just download the source code.
So it's, like, it's right there, you know?
Yeah, I was thinking, I mean, just to put a ‑‑ just trying to put a defense
mechanism in the filter.
Like so just things can be automatically decrypted to ‑‑ yeah, just, like, put a client
in the filter.
So you can put a client in the filter so you can understand the meaning of what has
been passed through.
I don't totally understand your question.
So let's talk after and then I'll get it.
You mentioned some academic work which sort of questioned whether in the long, long run
your protocol can fundamentally work because eventually they can adapt to your protocol.
Can you please give more details about it?
Yeah.
So that was the Parrot is Dead paper in which they say that packet‑based protocols ‑‑
packet‑based approaches to obfuscation won't work because they've already got some stuff
that they have done where they look at, like, the whole connection and then they're able
to classify stuff a lot better.
Which makes sense, right?
Like if you're not looking at one packet, if you're looking at all of the packets, you
have a lot more information that you can use to classify.
So yeah, sure, that's true.
Here's the thing, though.
If you are looking at the whole sequence of all of the packets, unless you delayed ‑‑ well,
not even then.
That means you passed them.
That means you passed the packets on to the server and then you got responses and you
recorded the whole conversation and then you classified it.
I won in that case, right?
The message got through.
Now maybe you had to burn that IP, maybe that IP is blocked now and you have to go to a
new IP because they said, oh, you're doing crazy stuff so we're going to block it.
That's already a problem, right?
That's already a problem that Tor deals with all the time.
You have to churn through new IPs all the time.
So I consider victory to be any time that I get the message through.
I don't care about anything else.
I don't care about people reading the messages.
I don't care about them decrypting the messages if it's afterwards and they couldn't use that
information to block the packets.
So we just have different, I think, goals.
The academic people are like, can we classify traffic, yes or no?
My question is, can they block the traffic, which they do through classification?
Okay.
So this will be the last question.
If anyone else wants to talk to our man here, we're going to take him over to the Chill
Out Cafe.
So one more.
Okay.
Only one more, so I'll make it count.
Can you multiplex traffic across multiple protocols and multiple endpoints is the first
part.
And the second part is are you IPv6 ready?
So good questions.
The first part, that is in the next version I'm working on, is multiplexing over multiple
protocols, multiple IPs, multiple ports.
And also between TCP and UDP, which nobody is doing, so I think that's cool.
Most people just don't like UDP.
I don't know why.
It's rad.
And IPv6 ready.
It's funny you say that.
Actually, the first version of Dust was IPv6 only, and people had to talk me down
from that.
They had to be like, look, Brandon, like, people don't
have IPv6.
I'm like, well, they better get it.
So the new version ‑‑ thank you.
Yes.
IPv6 is cool.
So the new version, I actually have just done IPv4, but I'm going to add IPv6 obviously
because actually one of the best ways to avoid deep packet inspection is use IPv6 because
they haven't gotten around to implementing most of the stuff for IPv6.
Another great thing you can do is there's a thing called Teredo, which is like IPv6
over IPv4 UDP with like built‑in hole punching and stuff, and it's like really sweet.
It's actually built into Windows 7, so if you have Windows 7, you already have it.
You can just go to IPv6 addresses.
That's another thing where they just, like, don't know what that traffic is.
So you just use that and then everything is fine.
There's a lot of, like, you know, cool little shortcuts to getting your traffic past the
filters by just using weird ‑‑ like use a weird protocol, you know, stuff like
that.
All right.
Thank you.
So, yeah, I'd be happy to talk to everybody.
See you guys at the Q&A room or if you just see me around, you know, let's hang out.
Let's get a beer.
Invite me to some parties.
Cool.
Thank you.
I appreciate it.
Thanks for your time.